Deep Learning Face Attributes in the Wild
Predicting face attributes in the wild is challenging due to complex face
variations. We propose a novel deep learning framework for attribute prediction
in the wild. It cascades two CNNs, LNet and ANet, which are fine-tuned jointly
with attribute tags, but pre-trained differently. LNet is pre-trained on
massive general object categories for face localization, while ANet is
pre-trained on massive face identities for attribute prediction. This framework
not only outperforms the state of the art by a large margin, but also reveals
valuable facts on learning face representation.
(1) It shows how the performances of face localization (LNet) and attribute
prediction (ANet) can be improved by different pre-training strategies.
(2) It reveals that although the filters of LNet are fine-tuned only with
image-level attribute tags, their response maps over entire images strongly
indicate face locations. This fact enables training LNet for face
localization with only image-level annotations, but without face bounding boxes
or landmarks, which are required by all attribute recognition works.
(3) It also demonstrates that the high-level hidden neurons of ANet
automatically discover semantic concepts after pre-training with massive face
identities, and such concepts are significantly enriched after fine-tuning with
attribute tags. Each attribute can be well explained with a sparse linear
combination of these concepts.
Comment: To appear in International Conference on Computer Vision (ICCV) 201
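Point (2) above says that response maps computed from image-level tags alone indicate where the face is. A minimal sketch of how a box could be read off such a map follows. The function name `box_from_response_map` and the relative threshold are hypothetical illustrations, not the authors' localization procedure, and the map is assumed to be non-negative with at least one positive entry.

```python
import numpy as np

def box_from_response_map(response, rel_threshold=0.5):
    """Return (top, left, bottom, right), inclusive, around strong responses.

    Assumption: `response` is a 2-D non-negative map with a positive maximum.
    """
    response = np.asarray(response, dtype=float)
    # Keep locations whose response is at least a fraction of the peak.
    mask = response >= rel_threshold * response.max()
    rows = np.flatnonzero(mask.any(axis=1))  # rows containing a strong response
    cols = np.flatnonzero(mask.any(axis=0))  # columns containing a strong response
    return int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1])

# Toy map: a bright rectangular blob on a dark background.
toy_map = np.zeros((8, 8))
toy_map[2:5, 3:7] = 1.0
toy_box = box_from_response_map(toy_map)
```

The tight box around the thresholded region is only one possible read-out; a real pipeline would likely smooth the map or pick the largest connected component first.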
Distributed Estimation and Inference with Statistical Guarantees
This paper studies hypothesis testing and parameter estimation in the context
of the divide and conquer algorithm. In a unified likelihood based framework,
we propose new test statistics and point estimators obtained by aggregating
various statistics from $k$ subsamples of size $n/k$, where $n$ is the sample
size. In both low dimensional and high dimensional settings, we address the
important question of how to choose $k$ as $n$ grows large, providing a
theoretical upper bound on $k$ such that the information loss due to the divide
and conquer algorithm is negligible. In other words, the resulting estimators
have the same inferential efficiencies and estimation rates as a practically
infeasible oracle with access to the full sample. Thorough numerical results
are provided to back up the theory.
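The divide and conquer idea can be illustrated with a small simulation. This is a sketch under simplifying assumptions: it averages ordinary least squares estimates over disjoint subsamples, which is not the likelihood-based test statistics or estimators of the paper, and the names `ols` and `divide_and_conquer_ols`, the simulated data, and the choice of 20 subsamples are all hypothetical.

```python
import numpy as np

def ols(X, y):
    """Least squares coefficients for y ~ X (no intercept)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def divide_and_conquer_ols(X, y, k):
    """Average the OLS estimates computed on k disjoint subsamples."""
    X_parts = np.array_split(X, k)
    y_parts = np.array_split(y, k)
    estimates = [ols(Xp, yp) for Xp, yp in zip(X_parts, y_parts)]
    return np.mean(estimates, axis=0)

# Simulated linear model with known coefficients (illustrative values only).
rng = np.random.default_rng(0)
n, d, k = 10_000, 3, 20
beta = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, d))
y = X @ beta + rng.normal(size=n)

beta_full = ols(X, y)                    # "oracle" using the full sample
beta_dc = divide_and_conquer_ols(X, y, k)  # aggregated subsample estimates
```

Comparing `beta_dc` with `beta_full` for increasing values of the subsample count gives an informal feel for the question the abstract raises: how many splits can be used before the aggregated estimator drifts away from the full-sample one.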
Relighting4D: Neural Relightable Human from Videos
Human relighting is a highly desirable yet challenging task. Existing works
either require expensive one-light-at-a-time (OLAT) captured data using a light
stage or cannot freely change the viewpoints of the rendered body. In this
work, we propose a principled framework, Relighting4D, that enables
free-viewpoint relighting from only human videos under unknown illuminations.
Our key insight is that the space-time varying geometry and reflectance of the
human body can be decomposed as a set of neural fields of normal, occlusion,
diffuse, and specular maps. These neural fields are further integrated into
reflectance-aware physically based rendering, where each vertex in the neural
field absorbs and reflects the light from the environment. The whole framework
can be learned from videos in a self-supervised manner, with physically
informed priors designed for regularization. Extensive experiments on both real
and synthetic datasets demonstrate that our framework is capable of relighting
dynamic human actors with free viewpoints.
Comment: ECCV 2022; Project Page:
https://frozenburning.github.io/projects/relighting4d Code is available at
https://github.com/FrozenBurning/Relighting4
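To make the roles of the normal, occlusion, and diffuse maps concrete, here is a minimal sketch of diffuse shading at a single surface point under a set of discrete environment lights. It is a textbook Lambertian model, not the Relighting4D renderer: the specular term is omitted, the function name `shade_diffuse` is hypothetical, and each light's RGB value is assumed to already include its solid-angle weight.

```python
import numpy as np

def shade_diffuse(normal, albedo, light_dirs, light_rgb, visibility):
    """Lambertian radiance at one point.

    normal:     (3,)   unit surface normal
    albedo:     (3,)   diffuse RGB reflectance
    light_dirs: (M, 3) unit directions pointing toward each light
    light_rgb:  (M, 3) light intensity, assumed pre-weighted by solid angle
    visibility: (M,)   1 where the light is unoccluded, 0 where it is blocked
    """
    # Cosine foreshortening; lights below the surface contribute nothing.
    cos_term = np.clip(light_dirs @ normal, 0.0, None)
    # Occlusion masks each light before the contributions are summed.
    irradiance = (visibility * cos_term) @ light_rgb
    return albedo / np.pi * irradiance

# Toy scene: one light overhead and one below the surface.
normal = np.array([0.0, 0.0, 1.0])
albedo = np.array([0.5, 0.5, 0.5])
light_dirs = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]])
light_rgb = np.full((2, 3), np.pi)
lit = shade_diffuse(normal, albedo, light_dirs, light_rgb, np.array([1.0, 1.0]))
shadowed = shade_diffuse(normal, albedo, light_dirs, light_rgb, np.array([0.0, 1.0]))
```

Relighting then amounts to swapping `light_rgb` for a new environment while keeping the per-point normal, albedo, and visibility fixed.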
Semantic Image Segmentation via Deep Parsing Network
This paper addresses semantic image segmentation by incorporating rich
information into Markov Random Field (MRF), including high-order relations and
mixture of label contexts. Unlike previous works that optimized MRFs using
iterative algorithms, we solve MRF by proposing a Convolutional Neural Network
(CNN), namely Deep Parsing Network (DPN), which enables deterministic
end-to-end computation in a single forward pass. Specifically, DPN extends a
contemporary CNN architecture to model unary terms and additional layers are
carefully devised to approximate the mean field algorithm (MF) for pairwise
terms. It has several appealing properties. First, different from the recent
works that combined CNN and MRF, where many iterations of MF were required for
each training image during back-propagation, DPN is able to achieve high
performance by approximating one iteration of MF. Second, DPN represents
various types of pairwise terms, making many existing works its special
cases. Third, DPN makes MF easier to parallelize and speed up on a
Graphics Processing Unit (GPU). DPN is thoroughly evaluated on the PASCAL VOC
2012 dataset, where a single DPN model yields a new state-of-the-art
segmentation accuracy.
Comment: To appear in International Conference on Computer Vision (ICCV) 201
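For readers unfamiliar with the mean field algorithm that DPN approximates, the following is a minimal sketch of a single mean field update on a grid MRF. It assumes a plain Potts pairwise term over 4-connected neighbours with a hand-set weight, which is far simpler than the learned high-order and label-context terms described above; `mean_field_step` and its parameters are hypothetical names, not DPN layers.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mean_field_step(unary, weight=1.0):
    """One mean field iteration for a Potts MRF on a 4-connected grid.

    unary: (H, W, L) per-pixel label costs (lower is better).
    Returns updated label marginals of shape (H, W, L).
    """
    q = softmax(-unary)  # initial marginals from the unary term alone
    # Zero-pad spatially so border pixels simply have fewer neighbours.
    padded = np.pad(q, ((1, 1), (1, 1), (0, 0)))
    neighbour_sum = (padded[:-2, 1:-1] + padded[2:, 1:-1]
                     + padded[1:-1, :-2] + padded[1:-1, 2:])
    # Potts penalty for label l is weight * sum_j (1 - q_j(l)); the constant
    # part cancels inside the softmax, leaving a reward for agreeing labels.
    return softmax(-unary + weight * neighbour_sum)

# Toy 3x3 image: every pixel prefers label 0 except a noisy centre pixel.
unary = np.zeros((3, 3, 2))
unary[..., 1] = 2.0
unary[1, 1] = [1.0, 0.0]
q_after = mean_field_step(unary, weight=2.0)
```

In this toy case the centre pixel initially favours label 1, and a single update pulls it toward the label of its neighbours, which is the smoothing behaviour the pairwise term is meant to provide.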